Review: a gentle introduction to imputation of missing values.

نویسندگان

  • A Rogier T Donders
  • Geert J M G van der Heijden
  • Theo Stijnen
  • Karel G M Moons
چکیده

In most situations, simple techniques for handling missing data (such as complete case analysis, overall mean imputation, and the missing-indicator method) produce biased results, whereas imputation techniques yield valid results without complicating the analysis once the imputations are carried out. Imputation techniques are based on the idea that any subject in a study sample can be replaced by a new randomly chosen subject from the same source population. Imputation of missing data on a variable is replacing that missing by a value that is drawn from an estimate of the distribution of this variable. In single imputation, only one estimate is used. In multiple imputation, various estimates are used, reflecting the uncertainty in the estimation of this distribution. Under the general conditions of so-called missing at random and missing completely at random, both single and multiple imputations result in unbiased estimates of study associations. But single imputation results in too small estimated standard errors, whereas multiple imputation results in correctly estimated standard errors and confidence intervals. In this article we explain why all this is the case, and use a simple simulation study to demonstrate our explanations. We also explain and illustrate why two frequently used methods to handle missing data, i.e., overall mean imputation and the missing-indicator method, almost always result in biased estimates.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons

Background Policy makers need models to be able to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. Presence of missing data challenges the practice of model development. Several studies suggested that performance of imputation methods is acceptable when missing rate is moderate. One of the issues which was of less concern...

متن کامل

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Frequent researches have been done on the use of diffe...

متن کامل

چند رویکرد برخورد با مقادیر گمشده‌ متغیرهای کمی و بررسی اثر آنها بر نتایج حاصل از یک کارآزمایی‌ بالینی

Background and Objectives: A major challenge that affects the longitudinal studies is the problem of missing data. Missing in the data may result in the loss of part of the information which reduces the accuracy of the estimator and obtain the results will be biased and inaccurate. Therefore, it is necessary to evaluate the missing data mechanism from a longitudinal research and to consider thi...

متن کامل

Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)

Most of the geochemical datasets include missing data with different portions and this may cause a significant problem in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most of geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...

متن کامل

تحلیل مشاهدات گمشده در مطالعه اثر دوزهای مختلف مکمل ویتامین D بر مقاومت به انسولین در دوران بارداری

Introduction: The aim  of  this  study  was to impute missing data  and  to compare the effect  of  different doses of  vitamin D supplementation on  insulin resistance during  pregnancy. Methods: A clinical trial  study   was done on 104  women  with diabetes and gestational age less than 12 weeks between 1391 and...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of clinical epidemiology

دوره 59 10  شماره 

صفحات  -

تاریخ انتشار 2006